ABSTRACT



Context. Several approaches to estimate frequency, phase and amplitude errors in time series analyses have been reported in the literature,
but they are either time consuming to compute, grossly overestimate the error, or are based on empirically determined criteria.
Aims. We seek a simple but realistic estimate of the frequency uncertainty in time series analyses.

Methods. Synthetic data sets with mono- and multi-periodic harmonic signals and with randomly distributed amplitude, frequency 
and phase were generated and white noise added. We tried to recover the input parameters with classical Fourier techniques and 
investigated the error as a function of the relative level of noise, signal and frequency difference. 

Results. We present simple formulas for the upper limit of the amplitude, frequency and phase uncertainties in time-series analyses. 
We also demonstrate the possibility to detect frequencies which are separated by less than the classical frequency resolution and that 
the realistic frequency error is at least 4 times smaller than the classical frequency resolution. 

Key words. methods: data analysis - methods: statistical



1. Motivation 

In the frequency analysis of time series, a realistic estimate of the 
amplitude, phase and frequency uncertainties can be of special 
interest. A few examples are:

- The comparison of frequencies derived for simultaneously
observed stars allows instrumental signals to be identified if the
same frequencies occur in different data sets within the
frequency uncertainty.

- One needs to know the observed frequency errors in order to 
assess the quality of a fit of models to the observations. 

- For mode identifications based on amplitude ratios or phase 
differences from multi-color photometry one also needs a 
reliable estimate for the frequency error. 

A combination of Fourier and least-squares fitting algorithms
(like SigSpec by Reegen 2007, Period04 by Lenz & Breger 2005,
or CAPER by Walker et al. 2005) is a frequently used method for
determining frequencies, amplitudes and phases of harmonic
signals. For a time series consisting of a perfect sine wave and
white noise, the frequency error is determined by the total time
base of the data set and the signal-to-noise ratio (SNR) of the
corresponding amplitude in the Fourier spectrum.
Montgomery & O'Donoghue (1999) defined the amplitude, phase
and frequency errors as
based on an analytical solution for the one-sigma error of a
least-squares sinusoidal fit with an rms of $\sigma(m)$. The total
number of data points, the total time base of the observations, the
signal amplitude, phase and frequency are $N$, $T$, $a$, $\phi$ and
$f$, respectively. Hence the last term in Eqs. 2 and 3 represents
SNR$^{-1}$ in the time domain. Note that the time-domain SNR in
these relations is not equal to the commonly used SNR in the
amplitude spectrum (peak amplitude divided by the average
amplitude in a given frequency range); the latter scales to the
time-domain SNR by a factor of $\approx \sqrt{\pi/N}$. Reegen (2007)
shows that this scaling cannot be applied uniquely to the full
frequency range and that systematic effects have to be taken into
account if an exact description of frequency-domain errors is needed.
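
The displayed Eqs. 1 to 3 are not reproduced above. As a minimal sketch, the following Python function evaluates the expressions in the form in which the Montgomery & O'Donoghue (1999) errors are commonly quoted; this form, the function name and the example values are assumptions, not taken from the text. The phase error is returned in radians and must be divided by $2\pi$ to obtain it in cycles.

```python
import math

def montgomery_errors(n_points, time_base, amplitude, rms):
    """One-sigma errors of a least-squares sinusoidal fit under white noise.

    Assumed (commonly quoted) form of the Montgomery & O'Donoghue relations:
        sigma(a)   = sqrt(2/N) * sigma(m)
        sigma(phi) = sqrt(2/N) * sigma(m) / a               [radians]
        sigma(f)   = sqrt(6/N) * sigma(m) / (pi * T * a)
    """
    inv_snr_time = rms / amplitude  # SNR^-1 in the time domain
    sigma_a = math.sqrt(2.0 / n_points) * rms
    sigma_phi = math.sqrt(2.0 / n_points) * inv_snr_time
    sigma_f = math.sqrt(6.0 / n_points) * inv_snr_time / (math.pi * time_base)
    return sigma_a, sigma_phi, sigma_f

# Hypothetical example: 10000 points over 10 days, amplitude 1, rms scatter 2
sigma_a, sigma_phi, sigma_f = montgomery_errors(10000, 10.0, 1.0, 2.0)
print(sigma_a, sigma_phi, sigma_f)
```

In this example the frequency error is far below the Rayleigh resolution $T^{-1}$ = 0.1 (in inverse days), which is the kind of gap the remainder of the paper quantifies.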

However, in reality an intrinsic signal is superposed not only 
by white noise (e.g. due to photon statistics) but also by corre- 
lated noise (e.g. atmospheric scintillation for ground-based data) 
or non-Gaussian distributed noise (e.g., introduced by the data 
reduction). Even the star itself can contribute correlated noise, 
for example due to granulation. All these noise sources increase 
the real frequency uncertainty which leads to the unsatisfying 
situation that in the literature several empirical parameters can 
be found which tune the frequency error to personal experience. 

People quite often use the Rayleigh frequency resolution
($T^{-1}$), defined by the total time base of the data set, which is
in most cases a dramatic overestimation of the real uncertainty.
To assess the uncertainties of the fitting parameters for the time
series analysis, it has turned out to be appropriate to perform
simulations with the actually analyzed data set, as is done by
Monte Carlo simulations in Period04 or by bootstrap simulations
in CAPER (see Rowe et al. 2006 for details). This approach has
the disadvantage that the simulations can be very time consuming,
especially if the data sets are big and/or include plenty of
signal components.
Fig. 1. top: Frequency error $\sigma(f)$ normalized to the Rayleigh
frequency resolution given by the data set length $T$ versus the
spectral significance. Given are average values in bins
(represented by the horizontal bars) as a result of a numerical
simulation of 42 597 synthetic data sets including a single sinusoidal
signal with random frequency, amplitude, phase and white noise
added. Vertical bars indicate the $+4\sigma$ (and $-1\sigma$)
distribution, illustrating that the heuristically determined
frequency error criterion (solid black line) represents a good
approximation for the upper limit of the frequency uncertainty,
which is smaller than the frequency resolution $T^{-1}$ by at least
a factor of $sig^{1/2}$. Cross symbols correspond to frequency errors
derived from the comparison of real ground based data with
high-precision space photometry of the same stars. For an
explanation of the grey line see the last but one paragraph of
Sec. 2.1. middle: Relative amplitude error versus the spectral
significance. The solid line indicates the upper limit for the
relative amplitude error given in this work. bottom: Phase error
(in units of $2\pi$) versus the spectral significance. The solid line
shows the "Montgomery phase error" converted to spectral
significance. all panels: The dashed lines represent the
analytically determined one sigma error of a sinusoidal
least-squares fit (Montgomery & O'Donoghue 1999).



2. Mono-periodic signal 

To quantify the effect of white noise on the frequency
determination of a coherent mono-periodic signal, a numerical
simulation was performed for 42 597 synthetic data sets. Each data
set consists of 10000 data points uniformly distributed over 10 days
and includes two components: a single sinusoidal signal with
random (uniformly distributed) frequency, amplitude and phase,
and Gaussian distributed scatter with a random (uniformly
distributed) amplitude (FWHM of the Gaussian random-number
generator). All input parameters are independent of each other.
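
One such synthetic data set could be generated along the following lines. This is only a sketch: the parameter ranges, the equidistant sampling and all names are hypothetical, since the text does not state them; only the number of points, the time base and the meaning of the noise amplitude (a FWHM) follow the description above.

```python
import numpy as np

def make_synthetic_dataset(rng, n_points=10000, time_base=10.0):
    """Single sinusoid plus Gaussian scatter, all parameters drawn independently."""
    # Assumption: equidistant sampling over the full time base (in days)
    t = np.linspace(0.0, time_base, n_points, endpoint=False)
    params = {
        "frequency": rng.uniform(1.0, 100.0),  # cycles/day, hypothetical range
        "amplitude": rng.uniform(0.1, 1.0),    # hypothetical range
        "phase": rng.uniform(0.0, 1.0),        # in cycles (units of 2*pi)
        "noise_fwhm": rng.uniform(0.1, 5.0),   # hypothetical range
    }
    # Convert the FWHM of the Gaussian noise generator to a standard deviation
    noise_sigma = params["noise_fwhm"] / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    signal = params["amplitude"] * np.sin(
        2.0 * np.pi * (params["frequency"] * t + params["phase"]))
    y = signal + rng.normal(0.0, noise_sigma, n_points)
    return t, y, params

rng = np.random.default_rng(42)
t, y, params = make_synthetic_dataset(rng)
```

Repeating the call with the same generator yields statistically independent data sets, as required for a simulation of this kind.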

2.1. Frequency error 

For the frequency analysis, the routine SigSpec$^1$ (Reegen 2007)
was used. It is an automatic program to detect periodic signals in
data sets and relies on an exact analytical solution for the
probability that a given DFT (Discrete Fourier Transform; Deeming
1975) amplitude is generated by white noise. Its main advantage
over commonly used signal-to-noise ratio estimates is that it
appropriately incorporates the frequency and phase angle in Fourier
space as well as the time-domain sampling, hence using all
available information instead of the mean amplitude only. The
SigSpec spectral significance is defined as the logarithm of the
inverse False-Alarm Probability that a DFT peak of a given
amplitude arises from pure noise in a non-equidistantly spaced
data set.

On average, an SNR of 4 corresponds to a spectral significance
value of 5.46. This means that an amplitude of four times
the noise level would appear by chance at a given frequency in
one out of $10^{5.46}$ cases, assuming white noise.
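
The quoted value can be checked with the approximation given later in this section (Reegen 2007), namely that the significance is roughly $(\pi \cdot \log e)/4$ times SNR$^2$ for significances below a few hundred. The sketch below applies that relation; the function names are hypothetical.

```python
import math

# Approximate factor between spectral significance and (amplitude-spectrum SNR)^2
SIG_PER_SNR2 = math.pi * math.log10(math.e) / 4.0

def snr_to_significance(snr):
    """Approximate spectral significance for a given amplitude-spectrum SNR."""
    return SIG_PER_SNR2 * snr ** 2

def significance_to_snr(sig):
    """Inverse of snr_to_significance (same range of validity)."""
    return math.sqrt(sig / SIG_PER_SNR2)

def false_alarm_probability(sig):
    """The significance is the logarithm of the inverse false-alarm probability."""
    return 10.0 ** (-sig)

print(round(snr_to_significance(4.0), 2))  # 5.46
```

The approximation is not meant for extremely significant signals, where the noise estimates entering the SNR and the significance differ.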

Fig. 1 shows the absolute deviation - scaled to the data set
length - between the input frequency and the SigSpec frequency
as a function of the spectral significance. Given are average
values in bins of spectral significance (indicated by the horizontal
bars). Not surprisingly, there is a clear dependency of the
frequency error on the significance (or SNR). Vertical bars indicate
the $+4\sigma$ (and $-1\sigma$) distribution of our simulation.
Obviously, the real frequency error quite often ($\approx$ 30 %)
exceeds the frequency error given by Eq. 3, which is indicated by a
dashed line in Fig. 1. However, we could heuristically define a
frequency error criterion (solid black line in the top panel of
Fig. 1) as

$$\sigma(f)_{\rm Ka} = \frac{1}{T \cdot \sqrt{sig}} \approx \frac{2}{\sqrt{\pi \cdot \log e} \cdot T \cdot {\rm SNR}}, \qquad (4)$$

representing a good approximation for the upper limit of the
frequency uncertainty and showing that the frequency uncertainty
is less than the frequency resolution $T^{-1}$, at least by a factor
of $\sqrt{sig}$. Only 4 out of 42 597 simulations result in a
frequency error exceeding the upper frequency error limit defined
in this way. Being aware that a simulation need not reflect reality,
we added the frequency errors of real observations to Fig. 1 (large
dots), derived from the comparison of ground based data with
long-term high-precision space photometry (MOST) of the same stars.
We note that plotting the frequency error as a function of
the signal frequency (or phase) reveals no correlation between
these quantities (in order to be independent of the spectral
significance, synthetic data sets with a fixed SNR have been used).
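
As a small illustration of the criterion, the following sketch evaluates the heuristic upper limit $1/(T \cdot \sqrt{sig})$ and compares it with the Rayleigh resolution. The function names and the example numbers are hypothetical; only the formula is taken from the text.

```python
import math

def rayleigh_resolution(time_base):
    """Classical frequency resolution 1/T."""
    return 1.0 / time_base

def frequency_error_upper_limit(time_base, sig):
    """Heuristic upper limit 1 / (T * sqrt(sig)) for a mono-periodic signal."""
    if sig <= 0.0:
        raise ValueError("spectral significance must be positive")
    return 1.0 / (time_base * math.sqrt(sig))

# Hypothetical example: 10-day data set, peak with spectral significance 25
T = 10.0
limit = frequency_error_upper_limit(T, 25.0)
print(limit, rayleigh_resolution(T) / limit)  # limit is 5 times below 1/T
```

For any significance above 1 the limit lies below the Rayleigh resolution, and the gain grows with the square root of the significance.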

The deviation from a linear relation at high significances in
the log-log scale of Fig. 1 is due to a distortion of the
significance scale which is explained in Fig. 2, where the SNR in the
amplitude spectrum is plotted versus the spectral significance for
frequencies determined from the synthetic data sets. For spectral
significances below a few hundred the significance is roughly
equal to $(\pi \cdot \log e)/4$ times the SNR$^2$ in the amplitude
spectrum (Reegen 2007). Only for extremely significant signals one
has to take into account that the noise calculation for the SNR and the
1 Significance Spectrum, 
http://www.astro.univie.ac.at/SigSpec/ 
spectral significance is different. Whereas the SNR is based on
the average amplitude in a Fourier spectrum after prewhitening
the signal (corresponds to the rms residual), the spectral
significance is based on the rms scatter of the time series including
the signal. In other words, a pure signal without noise has an
infinite SNR but still a finite spectral significance (see Reegen 2007
for details). The grey line in Fig. 1 takes this effect into account.

In order to explain the difference between the upper fre- 
quency error limit and the "Montgomery frequency error", we 
interpret the latter to be the statistically expected value for the 
frequency uncertainty corresponding to the average values in the 
spectral significance bins of our simulation. Finally, we have to 
mention that the frequency error distribution of our simulation 
(for fixed spectral significance) is neither Gaussian nor symmet- 
ric which makes it very difficult to define an analytical average 
value and scatter for the frequency uncertainty. 



2.2. Amplitude error 

Whereas the absolute amplitude error only depends on the time
series rms scatter (see Eq. 1), the relative amplitude error
should be correlated with the signal's spectral significance (or
SNR). The middle panel of Fig. 1 shows the relative amplitude
error (deviation between the input amplitude and the SigSpec
amplitude relative to the SigSpec amplitude) versus the spectral
significance of our simulated white noise data sets. The dashed
line indicates the relative amplitude error based on the absolute
"Montgomery amplitude error" representing the statistically
expected value. Analogous to our upper limit for the frequency
uncertainty, we could again define an upper limit for the amplitude
error of a sinusoidal least-squares fit as follows,



$$\sigma(a)_{\rm Ka} = \frac{a}{\sqrt{sig}} \approx \frac{2 \cdot a}{\sqrt{\pi \cdot \log e} \cdot {\rm SNR}}, \qquad (5)$$
indicated as a solid line in the middle panel of Fig. 1. However, the
upper limit for the amplitude error is not as well defined as for
the frequency error. But still $\approx$ 98% of the determined amplitude
errors are smaller than the given limit.

2.3. Phase error 

The bottom panel of Fig. 1 illustrates the absolute deviation
between the input phase and the SigSpec phase versus the spectral
significance of the 42 597 synthetic data sets. Again, the dashed
line indicates the phase error for a sinusoidal least-squares fit
according to Eq. 2. Contrary to the "Montgomery frequency error",
which corresponds to the statistically expected value for the
frequency uncertainty, the "Montgomery phase error" is consistent
with an upper limit for the real phase error. All but 4 numerically
determined phase errors are below the given limit. Eq. 2,
based on the time-domain SNR, is converted to spectral
significances (and frequency-domain SNR) as follows,
which is indicated by a solid and a dashed line in the bottom
panel of Fig. 1.
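
The conversion itself is not reproduced above. A possible derivation is sketched below under two assumptions that are not taken from the text: that Eq. 2 has the commonly quoted form $\sigma(\phi) = \sqrt{2/N} \cdot \sigma(m)/a$ with the phase in radians, and that the scaling factor $\sqrt{\pi/N}$ and the relation $sig \approx (\pi \cdot \log e)/4 \cdot {\rm SNR}^2$ quoted elsewhere in this paper apply.

```latex
\begin{align}
\sigma(\phi) &= \sqrt{\frac{2}{N}} \cdot \frac{\sigma(m)}{a}
  \approx \sqrt{\frac{2}{N}} \cdot \frac{1}{\sqrt{\pi/N} \cdot {\rm SNR}}
  = \sqrt{\frac{2}{\pi}} \cdot \frac{1}{{\rm SNR}} \\
 &\approx \sqrt{\frac{2}{\pi}} \cdot \sqrt{\frac{\pi \cdot \log e}{4 \cdot sig}}
  = \sqrt{\frac{\log e}{2 \cdot sig}}
\end{align}
```

Under these assumptions the number of data points cancels, so the phase error depends on the spectral significance alone; for a phase expressed in units of $2\pi$ the result would have to be divided by $2\pi$.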
Fig. 2. Amplitude spectrum signal-to-noise ratio (SNR) versus
spectral significance for frequencies determined from 42 597
synthetic data sets. The deviation from the linear relation (gray
line in the log-log plot) at high significances is due to the different
noise estimates for SNR and spectral significance. Whereas the
SNR is based on the rms scatter of the time series after
prewhitening the signal, the significance is based on the rms scatter
of the time series including the signal.



3. Multi-periodic signal 

Usually the smallest frequency separation of two independent 
signals in a data set which can be determined separately is called 
frequency resolution. 

For two signals with comparable amplitudes, a frequency
separation corresponding to the Rayleigh frequency resolution
($T^{-1}$) results in a local minimum between the two peaks in the
amplitude spectrum. Closer frequencies produce an asymmetric
peak whose maximum is roughly at the amplitude-weighted
mean of the frequencies. After prewhitening the signal
(corresponding to the subtraction of a scaled spectral window
at the given frequency, Roberts et al. 1987) some signal will
still be left in the amplitude spectrum. In other words, it should
be possible to determine frequency, amplitude and phase of
signals separated in frequency by less than the frequency resolution.
Hence, the uncertainties of these parameters should be less than
given by the Rayleigh criterion.
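
The residual left after prewhitening a blended peak can be illustrated with a noise-free toy example. This is only a sketch with hypothetical values: two sinusoids separated by half the Rayleigh resolution, a direct DFT amplitude spectrum, and a linear least-squares fit at the peak frequency standing in for the prewhitening step of the actual programs.

```python
import numpy as np

def amplitude_spectrum(t, y, freqs):
    """DFT amplitude on a frequency grid (direct evaluation)."""
    arg = 2.0 * np.pi * np.outer(freqs, t)
    re = (y * np.cos(arg)).sum(axis=1)
    im = (y * np.sin(arg)).sum(axis=1)
    return 2.0 * np.hypot(re, im) / len(t)

def fit_sinusoid(t, y, freq):
    """Linear least-squares fit of amplitude and phase at a fixed frequency."""
    design = np.column_stack([np.sin(2.0 * np.pi * freq * t),
                              np.cos(2.0 * np.pi * freq * t)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coeffs

T = 10.0                                 # time base in days, so 1/T = 0.1
t = np.linspace(0.0, T, 1000, endpoint=False)
f1, f2 = 5.00, 5.05                      # separation of 0.5 / T
y = 1.0 * np.sin(2.0 * np.pi * f1 * t) + 0.5 * np.sin(2.0 * np.pi * f2 * t)

freqs = np.linspace(4.5, 5.5, 1001)
f_peak = freqs[np.argmax(amplitude_spectrum(t, y, freqs))]

# Prewhiten the blended peak and inspect what is left
residual = y - fit_sinusoid(t, y, f_peak)
residual_rms = np.sqrt(np.mean(residual ** 2))
residual_peak = amplitude_spectrum(t, residual, freqs).max()
print(f_peak, residual_rms, residual_peak)
```

Even without noise a single sinusoid cannot absorb both components, so the residual keeps a clearly non-zero rms and amplitude peak, which is the signal a second prewhitening step would pick up.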

To quantify this uncertainty, a numerical simulation was
performed for ~50 000 synthetic data sets, now including two
signals with random frequency, amplitude and phase for the first
component. The second signal has a frequency randomly
separated from the first one by between 0 and 5 times the Rayleigh
frequency resolution ($T^{-1}$), a random amplitude between 0.1
and 1 times the amplitude of the first one and a random phase.
Gaussian distributed scatter with a random amplitude was added
to the synthetic data.

Fig. 3 shows the average absolute frequency error in bins
of the spectral significance of the stronger signal for different
ranges of the frequency separation $\Delta f$ (in units of the Rayleigh
frequency resolution) of the two input signals. The presence of
a second signal separated by less than the Rayleigh frequency
resolution limits the frequency uncertainty of the stronger signal
to $(4 \cdot T)^{-1}$ (see dashed lines in Fig. 3) if the spectral
significance exceeds a value of 16 (this is where both criteria give
the same frequency error). We have to note that this limit is again
purely heuristically determined. For a second signal, separated by more
than 3 times the Rayleigh frequency resolution, the frequency
uncertainty of the stronger signal is limited by the frequency
error criterion for a mono-periodic signal given by Eq. 4 (see bot-
Fig. 3. Same as top panel of Fig. 1, now including two sinusoidal
signals, illustrating average frequency errors $\sigma(f)$ (normalized to
the Rayleigh frequency resolution) of the stronger signal (first
detected in the prewhitening sequence) in bins of the spectral
significance along with $+4\sigma$ (and $-1\sigma$) environments in the bins.
The panels refer to different ranges of the frequency separation
$\Delta f$ (in units of $T^{-1}$) of the two input signals. The solid line
indicates the upper frequency error limit for mono-periodic
signals. The dashed line corresponds to the heuristically determined
upper frequency error limit for close frequencies and is equal to
$(4 \cdot T)^{-1}$.



not be Gaussian. As pointed out by Reegen (2007), the spectral
significance does not depend on the probability distribution
associated with the noise, and the only precondition is that
consecutive data points are uncorrelated. It has to be mentioned that
amplitude, frequency and phase errors derived from spectral
significances are only comparable to errors derived from SNR if the
time series is well sampled (e.g. continuous space observations).
Contrary to spectral significance based errors, SNR based error
estimations (time-domain as well as frequency-domain) do not
take into account the data sampling and can yield a crude
underestimation of the errors for "bad" sampling, as is almost
always the case for single-site ground based observations.

We have shown that the phase error defined by
Montgomery & O'Donoghue (1999) is consistent with our
simulations.

Furthermore we have shown that the determination of fre- 
quency pairs closer than the Rayleigh frequency resolution is 
possible and that the resulting frequency error is still 4 times 
smaller than the Rayleigh frequency resolution. However, our 
simulation does not say anything about the reliability of close 
frequency pairs in general. It tells us about the frequency un- 
certainty of a peak if, after prewhitening this peak, a second 
significant peak is present. It tells us that peaks do not influ- 
ence each other's frequency determination if they are separated 
in frequency by 3 times the Rayleigh frequency resolution. For 
closer peaks the frequency uncertainty is at least 4 times below 
the Rayleigh resolution even for peaks within the Rayleigh res- 
olution. 
tom panel in Fig. 3). There seems to be a smooth transition for
$1 < \Delta f < 3$ (middle panel).

Remarkably, only 13 out of ~50 000 ($\approx$ 0.026%) numerically
determined frequency errors do not satisfy the following
criterion.

If a second signal is present within about three times the
Rayleigh frequency resolution and the spectral significance is > 16, the
upper limit for the frequency error is

$$\sigma(f)_{\rm Ka} = \frac{1}{4 \cdot T}.$$

In all other cases the frequency error is smaller than

$$\sigma(f)_{\rm Ka} = \frac{1}{T \cdot \sqrt{sig}},$$

corresponding to Eq. 4.
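
The two-case criterion can be written as a small decision function. The sketch below is one possible reading of it; the function name and the boolean flag marking a close second signal are hypothetical, and the smooth transition for separations between 1 and 3 times the Rayleigh resolution is not modelled.

```python
import math

def frequency_error_limit(time_base, sig, close_companion=False):
    """Heuristic upper limit of the frequency error of a peak.

    close_companion: True if a second significant signal lies within about
    three times the Rayleigh frequency resolution 1/T (flag set by the caller).
    """
    if close_companion and sig > 16.0:
        return 1.0 / (4.0 * time_base)
    return 1.0 / (time_base * math.sqrt(sig))

# Hypothetical example: 10-day data set, peak with spectral significance 100
print(frequency_error_limit(10.0, 100.0, close_companion=True))   # 0.025
print(frequency_error_limit(10.0, 100.0, close_companion=False))  # 0.01
```

At a significance of exactly 16 both branches give the same value, $1/(4 \cdot T)$, so the limit is continuous across the threshold.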
4. Conclusions

Based on extensive simulations, we have shown that there is an
upper limit for the amplitude and frequency error in time series
data analyses. Compared to the statistically expected value for
the uncertainties given by Montgomery & O'Donoghue (1999),
our upper limits cover the possible error due to white noise and
leave even room for additional error sources like atmospheric
scintillation. A major advantage of calculating amplitude, fre- 
quency and phase errors in terms of spectral significance rather 
than signal-to-noise ratio is that the time-domain noise need 



